Approximate geodesic distances reveal biologically relevant structures in microarray data

Authors

  • Jens Nilsson
  • Thoas Fioretos
  • Mattias Höglund
  • Magnus Fontes

Abstract

MOTIVATION: Genome-wide gene expression measurements, as currently obtained with microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, which restricts cellular gene expression profiles to a certain manifold, or surface, in gene expression space. Various dimensionality reduction methods and distance metrics are used to gain knowledge about this manifold. For data points distributed on a curved manifold, a sensible distance measure is the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance.

RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive and biologically relevant visualizations when multidimensional scaling was applied. This suggests that the Isomap algorithm is a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
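The abstract names the Isomap algorithm as the source of its approximate geodesic distances but gives no procedure. The following is a minimal sketch of the usual Isomap-style construction, assuming expression profiles are given as plain Python sequences of numbers. It connects each sample to its k nearest neighbors in Euclidean distance, then takes shortest-path lengths through that graph (computed here with Dijkstra's algorithm) as approximate geodesic distances. The function names, the neighborhood size k, and the toy semicircle data are illustrative assumptions, not the authors' code or settings. Isomap would then pass the resulting distance matrix to multidimensional scaling, a step not shown here.

```python
import heapq
import math


def euclidean(a, b):
    """Ordinary Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph with Euclidean edge weights.

    Returns a dict mapping each sample index to {neighbor index: distance}.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            # Store the edge in both directions so the graph is symmetric.
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(points, k):
    """Approximate geodesic distances: shortest paths through the k-NN graph.

    Runs Dijkstra's algorithm from every sample. Pairs that are not
    connected in the graph keep the distance math.inf.
    """
    graph = knn_graph(points, k)
    n = len(points)
    matrix = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # outdated queue entry; a shorter path was already found
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        matrix.append(dist)
    return matrix


if __name__ == "__main__":
    # Hypothetical toy data: 21 "samples" lying on a curved 1-D manifold
    # (a unit semicircle) inside a 2-D "expression space".
    arc = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20))
           for t in range(21)]
    geo = geodesic_distances(arc, k=2)
    # The straight-line distance cuts across the curve (a chord of length 2) ...
    print("Euclidean:", round(euclidean(arc[0], arc[20]), 3))
    # ... while the graph distance follows the curve (close to the arc length pi).
    print("Geodesic: ", round(geo[0][20], 3))
```

The neighborhood size k is the main design choice. If k is too small, the graph can split into disconnected pieces, which gives infinite distances. If k is too large, edges shortcut across the curve and pull the geodesic estimate back toward the Euclidean one.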

Similar resources

Manifold Visualization via Short Walks

Visualizing low-dimensional non-linear manifolds underlying high-dimensional data is a challenging data analysis problem. Different manifold visualization methods can be characterized by the associated definitions of proximity between high-dimensional data points and score functions that lead to different low-dimensional embeddings, preserving different features in the data. The geodesic distanc...

Application of biclustering with the "large average submatrices" method to gene expression data from DNA microarrays

Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. This technology makes it possible to analyze expression levels for thousands of genes simultaneously under different conditions, and it yields massive amounts of information. While traditional clustering methods, such as hierarchical and K-means clustering, have been...

Annotation-based Distance Measures for Patient Subgroup Discovery in Clinical Microarray Studies

MOTIVATION Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been sugge...

On Approximate Geodesic-Distance Queries amid Deforming Point Clouds

We propose data structures for answering a geodesic-distance query between two query points in a two-dimensional or three-dimensional dynamic environment, in which obstacles are deforming continuously. Each obstacle in the environment is modeled as the convex hull of a continuously deforming point cloud. The key to our approach is to avoid maintaining the convex hull of each point cloud explici...

Geodesics using Waves: Computing Distances using Wave Propagation

In this paper, we present a new method for computing approximate geodesic distances. We introduce the wave method for approximating geodesic distances from a point on a manifold mesh. Our method involves the solution of two linear systems of equations. One system of equations is solved repeatedly to propagate the wave on the entire mesh, and one system is solved once after wave propagation is c...

Journal:
  • Bioinformatics

Volume 20, Issue 6

Publication date: 2004